Computational Biology and Chemistry
○ Elsevier BV
Preprints posted in the last 30 days, ranked by how well they match Computational Biology and Chemistry's content profile, based on 23 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Patel, A.; Patel, V.; Lotia, S.; Patel, K.; Mandlik, D.; Tan, J.; Sampath, P.; Patel, B.; Johar, K.; Bhatia, D. D.; Tanavde, V.; Patel, S.
Background: Chemo-resistance remains a major clinical challenge in Oral Squamous Cell Carcinoma (OSCC), attributed to intrinsically resistant cells. Although tumour-derived extracellular vesicles (EVs) have been implicated in cell-cell communication, their role in propagating chemo-resistance remains poorly defined. This study aims to identify salivary EV-associated miRNAs capable of predicting chemo-resistance and to delineate the role of miR-1307-5p in modulating cancer stem cell (CSC)-driven therapeutic refractoriness. Methods: The salivary EV-derived expression profile of miR-1307-5p was assessed by qPCR in chemo-resistant OSCC patients and further validated in TCGA small RNA sequencing datasets. Expression was validated by qPCR and correlated with clinicopathological outcomes. Functional assays including cell-cycle analysis, apoptosis, migration/invasion, 3D spheroids, angiogenesis, and CAM assays were performed in the miR-1307-5p-inhibited CD44 CSC subpopulation compared with its vehicle control. Transcriptomic profiling cross-referenced with TCGA was conducted to identify potential novel targets of miR-1307-5p. Chemo-sensitisation was assessed by treating the knockdown chemo-resistant cells with low-dose cisplatin and validating the effect using in-vitro functional assays and an orthotopic xenograft model. Results: miR-1307-5p was significantly elevated in salivary EVs of chemo-resistant OSCC patients and correlated with poor overall survival (p = 0.03). The miRNA was markedly enriched in endogenously resistant CD44 CSCs. Silencing of miR-1307-5p induced G2/M arrest, triggered apoptosis, impaired invasion, and reduced angiogenesis in both in-vitro and ex-vivo assays. Transcriptomic profiling, TCGA validation, and integrative pathway analysis identified key oncogenic hubs which converge on PI3K-AKT, MAPK/ERK, and YAP signalling pathways governing EMT.
Inhibition of miR-1307-5p restored cisplatin sensitivity in resistant CSCs, with low-dose cisplatin producing substantial tumour suppression in-vitro and in-vivo. Reduced CD44 expression in xenograft models confirmed CSC reprogramming. EVs from anti-miR-treated cells confer chemo-sensitisation upon uptake by resistant CSCs. Xenograft models substantiated that EVs can initiate tumour formation and that EV-mediated delivery of anti-miR-1307-5p drives significant tumour regression. Conclusion: This study identifies salivary EV-derived miR-1307-5p as a clinically relevant biomarker of chemo-resistance in OSCC and reveals its mechanistic role in sustaining CSC-driven therapeutic failure. Targeting miR-1307-5p offers a promising avenue for restoring cisplatin sensitivity and developing exosome-based therapeutic strategies.
ding, y.; lu, t.; Li, y.
Liquid-liquid phase separation (LLPS) of biomacromolecules is a key mechanism driving the formation of membraneless organelles (MLOs) within cells, playing a crucial role in fundamental biological processes such as cell proliferation and stress response. Accurately understanding and predicting the phase separation propensity of proteins is essential for unraveling the assembly mechanisms of MLOs and their functions under both physiological and pathological conditions. Traditional research methods primarily rely on biochemical experiments, which are limited by low throughput, high cost, and difficulty in systematically exploring sequence-phase transition relationships. This study proposes and implements a novel three-stage, iterative paradigm based on artificial intelligence (AI) to propel phase separation research towards systematization, predictability, and mechanistic understanding.
1. Benchmark Model Construction: A preliminary predictive model was established based on a Multilayer Perceptron (MLP) neural network, and the driving effect of phenylalanine/tyrosine (F/Y) residue-mediated π-π interactions on LLPS was validated.
2. Model Robustness Enhancement: The model was optimized through adversarial training strategies, which effectively identified and eliminated misclassifications of "highly disordered non-phase-separating" trap sequences. This significantly improved the model's generalization capability and reliability when handling complex, real-world sequences.
3. Physical Mechanism Integration and Functional Expansion: Incorporating the Uniform Manifold Approximation and Projection (UMAP) manifold learning method and constraints from non-equilibrium thermodynamics, a "fingerprint space" capable of characterizing the thermodynamic behavior of phase separation was constructed. This space enables cluster analysis of different MLO types, and the model can output a thermodynamic stability score for protein phase separation. Based on this score, we identified 10 high-confidence candidate proteins with the potential to form novel MLOs.
The paradigm established in this study upgrades phase separation prediction from the traditional "binary classification" approach to a novel research framework characterized by "physical mechanism analysis + novel MLO discovery." It provides the phase separation field with a computational tool that combines high accuracy, strong robustness, and good physical interpretability.
Jones, D.; Wu, Y.
Intrinsically disordered proteins (IDPs) mediate many cellular functions through interactions with structured protein partners, but predicting the corresponding binding sites on the structured partner remains challenging. Here, we present IDBSpred, a sequence-based method for residue-level prediction of IDP-binding sites on structured proteins. Training and test data were collected from the DIBS database, which contains more than 700 non-redundant IDP-protein complexes. Residue-level embeddings of structured partner sequences were generated using the ESM-2 protein language model and used as input to a multilayer perceptron classifier for binary prediction of binding versus non-binding residues. Analysis of amino acid composition showed that IDP-binding sites are enriched in aromatic residues, especially Trp, Tyr, and Phe, as well as several charged and polar residues, whereas Ala and several small or conformationally restrictive residues are depleted. The classifier achieved an ROC AUC of 0.87 and an average precision of 0.61. Structural case studies further showed that the predicted sites largely recapitulate the major experimentally defined binding interfaces. These results demonstrate that protein language model embeddings plus machine learning algorithms can effectively capture sequence features associated with IDP recognition on structured proteins. IDBSpred provides a practical framework for studying IDP-mediated interfaces and identifying potential therapeutic hotspots.
Harada, M.; Tabara, M.; Kuriyama, K.; Ito, K.; Bono, H.; Sakamoto, T.; Nakano, M.; Fukuhara, T.; Toyoda, A.; Fujiyama, A.; Tabunoki, H.
MicroRNAs (miRNAs) play essential roles in the posttranscriptional regulation of gene expression in organisms. During the synthesis of mature miRNAs, miRNA precursors are cleaved by Dicer at their loop structure, after which they mature and regulate gene expression. However, the consequences of altering the loop sequence are not fully understood. The silkworm Bombyx mori is a lepidopteran insect with many genetic strains. In a silkworm strain with translucent larval skin, we identified a mutant of the miRNA miR-3260 in which part of the loop structure was missing. Here, we aimed to analyze the role of wild-type miR-3260 and the influence of the loop-structure mutation in B. mori. First, we identified the genomic region responsible for the translucent larval skin phenotype and determined the nucleotide sequence of the mutated miR-3260. Then, we predicted the binding partners of wild-type miR-3260 using the RNAhybrid tool and found two juvenile hormone (JH)-related genes as targets of wild-type miR-3260. Next, we assessed the relationship between miR-3260 and JH and found that miR-3260 was highly expressed in the corpora allata and that its expression responded to JH treatment. Meanwhile, a miR-3260 mimic and inhibitor did not induce the typical phenotypes associated with JH in B. mori. Then, we compared the dicing products from wild-type and mutant miR-3260 precursors and observed that neither form underwent Dicer-mediated cleavage when the loop structure was altered. These results suggest that loop mutations in the miR-3260 precursor may not influence dicing activity, consistent with the lack of observable phenotypic effects.
Jiang, X.; Luo, Y.; Azad, M. A. K.; Xu, L.; Xiao, M.; Velkov, T.; Roberts, K. D.; Thamlikitkul, V.; Zhou, Q. T.; Zhou, F.; Li, J.
Background: Multidrug-resistant (MDR) Gram-negative bacteria have triggered a critical global health crisis. Polymyxin lipopeptide antibiotics are used as a last-line therapy against these problematic pathogens, but their clinical use is largely limited by severe nephrotoxicity. Human oligopeptide transporter 2 (hPepT2) is a membrane transporter mediating the reabsorption of polymyxins in renal proximal tubular cells, substantially contributing to their nephrotoxicity. However, it remains unclear how polymyxins interact with hPepT2. Methods: In this study, we investigated the structure-interaction relationship (SIR) of polymyxins with hPepT2 by integrating computational, chemical and cell biology approaches. Bioinformatic modelling predicted the residues essential for the binding of polymyxins to hPepT2. Transporter mutagenesis and molecular analysis were employed to explore the role of each residue in the interaction between hPepT2 and polymyxins. Moreover, we synthesised a series of polymyxin-like analogues by altering the moieties that are critical for binding to hPepT2. The antibacterial activity and nephrotoxicity of these analogues were subsequently assessed. Results: Our bioinformatic modelling proposed an outward-facing structure of hPepT2 with a possible transport pathway in which polymyxins bind to the lateral opening site of hPepT2 (e.g. E214, D215, D317, D342, E622). Molecular assays for transporter function and expression confirmed that the D215 residue of hPepT2 is critical for polymyxin binding, while several other residues significantly affect transporter turnover rate and/or protein expression. Our experimental validations showed that lipopeptide analogues with altered Dab1, Dab3, Dab5 and Dab9 moieties demonstrated decreased interactions with hPepT2. Among these synthetic analogues, the one with alanine substitution at Dab3 showed reduced nephrotoxicity in mice while retaining antibacterial activity against a range of bacterial strains.
Conclusions: Overall, this proof-of-concept study demonstrated that the computationally predicted and experimentally validated polymyxin-hPepT2 SIR model provides a viable approach for the discovery of novel, safer lipopeptide antibiotics.
ye, w.; Jiang, X.; Shen, F.
Objective: To address core problems prevalent in biomedical research, including the "translational distance", the difficulty in aligning cross-scale studies, and the lack of direct validation of single-cell systems biology models in human samples, this study aims to verify whether the results of transcriptome-wide Mendelian randomization (TWMR) based on large-scale populations are consistent with the causal inference results of deep learning combined with double machine learning (DML) using single-cell transcriptome data from human samples, to clarify whether statistical biology and systems biology can converge to the same biological truth, and to provide methodological support for mechanism dissection and precision medicine research of complex diseases such as rheumatoid arthritis (RA). Methods: This study integrated multi-omics data to conduct a two-stage causal inference and cross-scale validation analysis. In the first stage, based on the summary statistics of an RA genome-wide association study (GWAS) from 456,348 individuals of European ancestry in the UK Biobank (UKB), and cis-expression quantitative trait locus (cis-eQTL) data from 31,684 individuals in the eQTLGen Consortium, a two-sample Mendelian randomization approach was adopted. Transcriptome-wide causal effect analysis was performed using the inverse-variance weighted (IVW) method, MR Egger regression, and the weighted median method, and gene-level causal effect values were obtained after strict quality control and multiple testing correction.
In the second stage, based on single-cell RNA sequencing (scRNA-seq) data from RA patients and healthy controls (RA group: 11 samples, 211,867 cells; Healthy control group: 38 samples, 456,631 cells), after preprocessing via the Seurat pipeline, batch effect correction, and cell type annotation, a hierarchical deep neural network was constructed to complete feature compression of high-dimensional expression data, and the DML framework was used to estimate the causal effects of genes on RA disease status. Finally, Pearson correlation analysis was performed to conduct cell type-specific cross-scale validation of gene-level causal effect values obtained by the two methods, and the validated model was used to quantify the causal effects of 16 RA-related pathways from the Reactome database. Results: This study confirmed that the gene causal effect values obtained from large-scale population TWMR analysis were significantly correlated with those calculated by the deep learning combined with DML model based on single-cell transcriptome data. Among them, the correlation was extremely significant (p<0.001) in core naive B cells (r=0.202, p=3.2e-05, n=414) and core naive CD4 T cells (r=0.102, p=0.037, n=412). The validated DML model successfully quantified the cell type-specific causal effect values of 16 RA-related signaling pathways. Conclusion: Statistical biology and systems biology can converge to the same biological truth. The cross-scale consistency between the two can significantly shorten the "translational distance" in biomedical research, and enables direct validation of the single-cell systems biology causal model of human samples based on large-scale population genetic data, reducing the excessive dependence on animal/cell experimental models in traditional research.
This research paradigm not only provides a new path for mechanism dissection and therapeutic target screening of complex diseases such as RA, but also provides a feasible solution for rare disease research to break through the limitation of GWAS sample size, and lays an important theoretical and methodological foundation for constructing standardized systems biology models of human complex diseases and promoting the development of precision medicine.
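The inverse-variance weighted (IVW) step named in the Methods above can be illustrated with the standard fixed-effect formula for summary-level data. This is a minimal sketch, assuming independent instruments and per-variant effect sizes as inputs; the function and argument names are hypothetical and are not taken from the preprint.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from summary statistics.

    beta_exposure: per-variant effects on the exposure (e.g. gene expression)
    beta_outcome:  per-variant effects on the outcome (e.g. disease status)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one instrument is required")

    # Weighted regression of outcome effects on exposure effects through the
    # origin, with weights 1 / se_outcome^2.
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    if denominator == 0:
        raise ValueError("exposure effects must not all be zero")
    return numerator / denominator, sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    estimate, std_error = ivw_estimate(
        [0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01]
    )
    print(estimate, std_error)
```

MR Egger and the weighted median, also named in the abstract, relax different assumptions and are not covered by this sketch.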
He, Z.; Li, Y.; Shkurat, T. P.; Butenko, E. V.; Derevyanchuk, E. G.; Lomteva, S. V.; Chen, L.; Lipovich, L.
Background: Polycystic ovary syndrome (PCOS) is a prevalent endocrine disorder and a leading cause of female infertility, with complex genetic, metabolic, and hormonal etiologies. Long non-coding RNAs (lncRNAs) have emerged as important regulators of diverse biological processes, yet their roles in PCOS remain underexplored. Here, we identified and characterized PCOS differentially expressed gene-associated lncRNAs (PDEGAL) with an integrative approach combining expression data, genetic association, and evolutionary analysis. Methods: Thirty-three PCOS-associated protein-coding genes were obtained from our prior study, and all their nearby and overlapping lncRNAs were annotated. These candidates were analyzed using UCSC Genome Browser-mapped annotations and datasets, including NCBI RefSeq, GENCODE, GTEx, GWAS SNPs, and conservation, as well as the FANTOM5 cap analysis of gene expression (CAGE) promoter data, to assess their expression, regulatory potential, genetic variant overlaps, and evolutionary conservation. Results: Twenty-three PDEGALs (18 antisense to, and 5 sharing bidirectional promoters with, known PCOS-associated protein-coding genes) were identified. 17 PDEGALs contained GWAS SNPs with statistically significant disease associations, 9 of which were associated with PCOS-related traits. 5 PDEGALs demonstrated expression in the KGN granulosa cell model of PCOS. Key gene structure element (KGSE) analysis revealed that most PDEGALs are primate-specific. Integrating four criteria--GTEx expression, GWAS SNPs, FANTOM promoterome, and KGSE conservation--highlighted HELLPAR as the only lncRNA fulfilling all four, while five others--PGR-AS1, MTOR-AS1, ENSG00000265179, ENSG00000256218, and LOC105377276--fulfilled three of the four criteria. Conclusions: We have systematically identified candidate PCOS regulatory lncRNAs with convergent genetic, expression, and evolutionary evidence.
These results provide a framework for functional validation and highlight lncRNAs as potential biomarkers and therapeutic targets in PCOS that function by regulating their nearby and overlapping protein-coding genes.
Fletcher, W. L.; Sinha, S.
The practices of identifying biomarkers and developing prognostic models using genomic data have become increasingly prevalent. Such data often feature characteristics that make these practices difficult, namely high dimensionality, correlations between predictors, and sparsity. Many modern methods have been developed to address these problematic characteristics while performing feature selection and prognostic modeling, but a large-scale comparison of their performance in these tasks on diverse right-censored time-to-event data (aka survival time data) is much needed. We have compiled many existing methods, including some machine learning methods, several of which have performed well in previous benchmarks, primarily for comparison with regard to variable selection capability, and secondarily for survival time prediction, on many synthetic datasets with varying levels of sparsity, correlation between predictors, and signal strength of informative predictors. For illustration, we have also performed multiple analyses on a publicly available and widely used cancer cohort from The Cancer Genome Atlas using these methods. We evaluated the methods through extensive simulation studies in terms of the false discovery rate, F1-score, concordance index, Brier score, root mean square error, and computation time. Of the methods compared, CoxBoost and the Adaptive LASSO performed well in all metrics, and the LASSO and elastic net excelled when evaluating concordance index and F1-score. The Benjamini-Hochberg and q-value procedures showed volatile performance in controlling the false discovery rate. Some methods' performance was greatly affected by differences in the data characteristics. With our extensive numerical study, we have identified the best performing methods for a plethora of data characteristics using informative metrics. This will help cancer researchers in choosing the best approach for their needs when working with genomic data.
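Of the metrics listed above, the concordance index is the one most specific to right-censored data. The sketch below shows one common definition (Harrell's C) in plain Python; it is an illustration under that assumption, since the abstract does not state which variant the authors used, and the function name is hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored survival data.

    times:       observed follow-up times
    events:      1/True if the event was observed, 0/False if censored
    risk_scores: model output where a higher score means higher predicted risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Illustrative toy cohort: the third subject is censored at time 6.
    print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of comparable pairs.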
Khan, H.; Garcia-Galindo, P.; Ahnert, S. E.; Dingle, K.
A morphospace is an abstract space of theoretically possible biological traits, shapes, or property values. It is interesting to explore which parts of a morphospace life occupies, as compared to those parts which could be occupied, but are not. Comparing random and natural non-coding (nc) RNA secondary structures is an established approach to studying morphospace occupation for RNA structures. Most earlier studies have focused on the minimum free energy (MFE) structure, while relatively few have looked at the Boltzmann distribution, describing the ensemble of energetically suboptimal RNA folds. These suboptimal structures may have important roles and functions, and hence should be examined carefully. Here we compare random and natural ncRNA in terms of their Boltzmann distributions, finding that natural RNA tend to have very similar profiles to random RNA, with the main difference being that natural RNA are slightly more energetically stable, except for very short sequences (20 to 30 nucleotides) which tend to be slightly less stable. We infer that natural ncRNA occupy similar parts of the morphospace that random RNA do, indicating that the biophysics of the genotype-phenotype map largely determines the ensemble properties of ncRNA.
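The Boltzmann distribution referred to above assigns each candidate structure a probability proportional to exp(-E/RT). The sketch below computes these probabilities for a hand-written list of free energies; the energies, the default temperature of 37 °C, and the function name are illustrative assumptions, and a real analysis would obtain the ensemble from an RNA folding package.

```python
from math import exp

GAS_CONSTANT_KCAL = 0.0019872  # kcal / (mol * K)


def boltzmann_probabilities(energies_kcal, temperature_k=310.15):
    """Probability of each structure given its free energy in kcal/mol."""
    if not energies_kcal:
        raise ValueError("at least one energy is required")
    rt = GAS_CONSTANT_KCAL * temperature_k
    # Subtracting the minimum energy keeps the exponentials well scaled and
    # does not change the normalised probabilities.
    e_min = min(energies_kcal)
    weights = [exp(-(e - e_min) / rt) for e in energies_kcal]
    partition = sum(weights)
    return [w / partition for w in weights]


if __name__ == "__main__":
    # Hypothetical ensemble: one MFE structure and three suboptimal folds.
    for energy, prob in zip(
        [-12.3, -11.8, -10.5, -9.0],
        boltzmann_probabilities([-12.3, -11.8, -10.5, -9.0]),
    ):
        print(f"{energy:7.2f} kcal/mol  p = {prob:.3f}")
```

The minimum free energy structure always receives the largest probability, but suboptimal folds within a few RT of it can hold a substantial share of the ensemble.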
Xu, Y.; Zhang, X.; Chen, W.; Li, Y.; Lu, L.; Huang, R.; Liao, J.; Li, H.; Zheng, W.
Purpose: Differentially expressed genes (DEGs) between colorectal cancer liver metastasis (CRLM) epithelium and primary colorectal cancer (CRC) epithelium (LMR DEGs) identified based on single-cell RNA sequencing (scRNA-seq) data may become new biomarkers for CRC prognosis. Methods: An scRNA-seq dataset was used to describe the cellular landscape of primary CRC and CRLM and identify LMR DEGs. Prognostic LMR DEGs were identified in the bulk RNA-seq dataset. Based on the prognostic LMR DEGs, multiple machine learning algorithm combinations were compared in terms of their C-index, and the best model was selected for the construction of the LMR score. Results: Among the 2070 LMR DEGs, 426 prognostic LMR DEGs were ultimately obtained. The combination of the random survival forest (RSF) model and ridge regression had the highest C-index and was therefore used to construct a 15-gene scoring system (LMR score). In the external validation set, the 1- and 5-year AUCs of the LMR score were greater than those of the AJCC stage and other scoring systems constructed with a similar dataset. In addition, the LMR score was closely associated with factors that influence CRC outcomes, such as immune infiltration. Conclusion: The LMR score may be a reliable new biomarker for predicting the prognosis of patients with CRC.
Matsuda, K.; Moriya, Y.; Xu, L.; Ohmagari, R.; Aramaki, S.; Zhang, C.; Baba, A.; Hirayama, S.; Kahyo, T.; Setou, M.
Ubiquitin-like protein 3 (UBL3) is a post-translational modifier that sorts proteins into small extracellular vesicles and regulates the trafficking of disease-associated proteins such as α-synuclein. The structural and dynamic features of the UBL domain that underlie these functions, however, remain poorly understood. Here we performed in silico structural dynamics analysis of the UBL3 UBL domain using an NMR structure ensemble combined with anisotropic network modeling (ANM) and perturbation response scanning (PRS). Principal component analysis and residue-wise fluctuation analysis consistently revealed high flexibility in the C-terminal region of UBL3. Comparative ANM analysis across 20 ubiquitin-like proteins (UBLs) further showed that C-terminal flexibility is a conserved yet variable property within the UBL family. PRS analysis demonstrated that residues forming the central α-helix of the β-grasp fold exert greater dynamic control over collective motions than β-sheet residues. Notably, UBL3 displayed the highest helix/sheet PRS effectiveness ratio among all UBLs analyzed, highlighting the prominent dynamic contribution of helix residues in this domain. Together, these results provide a structural basis for understanding UBL3-dependent protein interactions and disease-related mechanisms, and suggest that helix-centered dynamic control in the UBL domain may represent a potential target for modulating UBL3 function.
Cui, T.; Wang, Z.; Wang, T.
AI-based molecular dynamics simulation brings ab initio calculations to biomolecules in an efficient way, in which the machine learning force field (MLFF) plays the central role by accurately predicting molecular energies and forces. Most existing MLFFs assume localized interatomic interactions, limiting their ability to accurately model non-local interactions, which are crucial in biomolecular dynamics. In this study, we introduce ViSNet-PIMA, which efficiently learns non-local interactions through a physics-informed multipole aggregator (PIMA) and accurately encodes molecular geometric information. ViSNet-PIMA outperforms all state-of-the-art MLFFs for energy and force predictions of different kinds of biomolecules and various conformations on the MD22 and AIMD-Chig datasets, while adapting the PIMA blocks into other MLFFs further achieves 55.1% performance gains, demonstrating the superiority of ViSNet-PIMA and the universality of the model design. Furthermore, we propose AI2BMD-PIMA to incorporate ViSNet-PIMA into the AI2BMD simulation program by introducing a "Transfer Learning-Pretraining-Finetuning" scheme and replacing molecular mechanics-based non-local calculations among protein fragments with ViSNet-PIMA, which reduces AI2BMD's energy and force calculation errors by more than 50% for different protein conformations and protein folding and unfolding processes. ViSNet-PIMA advances ab initio calculation for entire biomolecules, broadening the applicability of AI-based molecular dynamics simulations and property calculations in biochemical research.
Misra, P.; Movva, N. S. V.; Shah, R.
Purpose/Objective: This study aimed to design and computationally evaluate a synthetic GluN1-mimetic peptide as a decoy to bind and neutralize pathogenic autoantibodies in anti-NMDA receptor (NMDAR) encephalitis, a severe autoimmune neurological disorder affecting approximately 1.5 per million individuals annually. Methods: Key GluN1 epitope residues (351-390 of the amino-terminal domain) were identified from crystallographic evidence and patient-derived antibody binding studies. Multiple peptide variants were rationally designed to mimic the antibody-binding interface. AlphaFold2 was used to predict peptide structures. Rigid-body docking simulations were conducted with HADDOCK 2.4 to model peptide-antibody complexes, and binding affinities were quantified using PRODIGY. A scrambled peptide control was included to establish docking specificity. Results: The top-performing peptide demonstrated favorable predicted binding (ΔG = -21.5 kcal/mol, Kd = 1.7 x 10-{superscript 1} M) with an average pLDDT score of 90%, a buried surface area of 3,255.5 Å², and 18 intermolecular hydrogen bonds. Relative to the scrambled control (ΔG = -8.3 kcal/mol), the designed peptide showed substantially stronger predicted binding. Conclusion/Implications: These results support the validity of an epitope-mimicry design strategy and establish a scalable computational framework for prioritizing peptide decoy candidates applicable to other antibody-mediated autoimmune disorders. Experimental validation remains necessary to confirm real-world efficacy.
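The abstract above reports both a binding free energy and a dissociation constant. The two are linked by the general thermodynamic relation ΔG = RT ln(Kd), which the sketch below applies. The temperature of 25 °C is an assumption (the abstract does not state the one used), and the function name is hypothetical.

```python
from math import exp

GAS_CONSTANT_KCAL = 0.0019872  # kcal / (mol * K)


def dissociation_constant(delta_g_kcal, temperature_c=25.0):
    """Kd in mol/L from a binding free energy, using dG = RT * ln(Kd)."""
    temperature_k = temperature_c + 273.15
    if temperature_k <= 0:
        raise ValueError("temperature must be above absolute zero")
    return exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


if __name__ == "__main__":
    # The two predicted free energies quoted in the abstract.
    for label, delta_g in [("designed peptide", -21.5), ("scrambled control", -8.3)]:
        print(f"{label}: Kd = {dissociation_constant(delta_g):.2e} M")
```

Because Kd depends exponentially on ΔG, a difference of roughly 1.4 kcal/mol at this temperature corresponds to about a tenfold change in predicted affinity.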
Duarte, S. A.; Mehdiabadi, M.; Bugnon, L. A.; Aspromonte, M. C.; Piovesan, D.; Milone, D. H.; Tosatto, S.; Stegmayer, G.
Intrinsically disordered proteins (IDPs) play an important role in a wide range of biological functions and are linked to several diseases. Due to technical difficulties and the high cost of experimental determination of disorder in proteins, combined with the exponential increase of unannotated protein sequences, the development of computational methods for disorder prediction has become an active area of research in the last few decades. In this work, we present emb2dis, a deep learning model that uses protein language models (pLMs) to predict disorder from sequence. The emb2dis tool is a pre-trained model that receives a protein sequence as input, calculates its pLM embedding and passes it to a deep learning model. In contrast to existing approaches, emb2dis integrates informative sequence representations with a novel architecture that combines residual networks (ResNets) and dilated convolutions. This design effectively enlarges the receptive field of the convolution operation, enabling the model to better capture an extended context of each amino acid. At the output, emb2dis assigns a disorder propensity score to each residue in the sequence. The model was evaluated on datasets from the latest CAID3 blind benchmark for disorder prediction, where it achieved first place in the Disorder-PDB category, exhibiting strong performance with high AUC and Fmax scores. Additionally, it ranked among the top ten methods on the Disorder-NOX dataset. We provide a freely available web-demo for emb2dis and a source code repository for local installation. Web link for the tool: https://sinc.unl.edu.ar/web-demo/emb2dis/ The importance of the emb2dis tool is that it provides a new deep learning approach and significant improvements in the prediction of protein disorder, with a simple web interface and graphical output detailing per-residue disorder.
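The claim that dilated convolutions enlarge the receptive field can be made concrete with a small calculation. The sketch below assumes a stack of stride-1 one-dimensional convolutions with a shared kernel size; the layer counts and dilation rates are illustrative and are not the emb2dis configuration, which the abstract does not specify.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in residues) of stacked stride-1 1D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    if kernel_size < 1:
        raise ValueError("kernel_size must be at least 1")
    field = 1
    for dilation in dilations:
        if dilation < 1:
            raise ValueError("dilation rates must be at least 1")
        field += (kernel_size - 1) * dilation
    return field


if __name__ == "__main__":
    # Four ordinary layers versus four layers with doubling dilation.
    print(receptive_field(3, [1, 1, 1, 1]))
    print(receptive_field(3, [1, 2, 4, 8]))
```

With the same number of layers and parameters, the dilated stack lets each output position see a much longer stretch of sequence, which is the effect the abstract describes.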
Choudhary, S.; Guleria, V.
Background: The most prevalent kind of oral cancer is oral squamous cell carcinoma (OSCC), which has a poor prognosis because of delayed detection and a lack of molecular indicators. Methods: Transcriptomic data from TCGA were analyzed to identify differentially expressed genes between OSCC and normal samples. Functional enrichment analysis was performed to determine biological pathways. A protein-protein interaction network was constructed using STRING and visualized in Cytoscape to identify hub genes. Results: A total of 5732 differentially expressed genes were identified, including 2459 upregulated and 3273 downregulated genes. Network analysis revealed several highly connected hub genes such as CDK1, CCNB1, TOP2A, BUB1, and MMP9. Functional enrichment indicated significant involvement of cell cycle regulation and cancer-associated pathways. Conclusion: This integrative analysis identified key regulatory hub genes that may be involved in OSCC progression. These genes may serve as promising biomarkers and therapeutic targets for future studies.
Dervaux, J.; Brunet, P.
The growth of cyanobacterial cultures and their formation of mucilage blooms in response to salt stress are investigated with a focus on the influence of pH. In non-buffered medium, the pH of the cultures increases from 6.5 just after inoculation up to 11 during the exponential phase. We record the time evolution of concentration and pH for different initial OD0. In a second set of experiments, we extract the doubling time of the unbuffered cultures and compare it with that of cultures inoculated in pH-buffered BG11 media at four different pH values from 6.3 to 10.5: in the most acidic media, all cultures die or grow very slowly. At pH = 10.5, we obtain the fastest growth for all four strains, which qualifies these cyanobacteria as alkaliphiles, though for all strains with comparable initial OD0, the doubling time is shorter for unbuffered cultures. Following a previous study [31], we finally investigate the influence of pH on mucilage formation and biomass uplift induced by salt stress, involving EPS flocculation by cations. Our results show that operating in buffered media significantly influences the mucilage formation, though the observed regimes cannot be simply correlated to the pH value.
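The doubling time mentioned above is commonly extracted by fitting a straight line to the logarithm of optical density during the exponential phase. The sketch below shows that calculation with an ordinary least-squares fit; it is an assumption about the procedure, since the abstract does not describe how the authors performed the fit, and the function name and sample values are hypothetical.

```python
from math import log


def doubling_time(times, optical_densities):
    """Doubling time from a least-squares fit of ln(OD) against time.

    Assumes all points lie in the exponential growth phase. The result is in
    the same unit as `times`.
    """
    if len(times) != len(optical_densities) or len(times) < 2:
        raise ValueError("need at least two (time, OD) pairs of equal length")
    if any(od <= 0 for od in optical_densities):
        raise ValueError("optical densities must be positive")

    log_od = [log(od) for od in optical_densities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_od) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_od))

    growth_rate = sxy / sxx  # slope of ln(OD) versus time
    if growth_rate <= 0:
        raise ValueError("culture is not growing; doubling time undefined")
    return log(2) / growth_rate


if __name__ == "__main__":
    # Illustrative series in which the OD doubles every day.
    print(doubling_time([0, 1, 2, 3], [0.1, 0.2, 0.4, 0.8]))
```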
Ahmadov, A.; Ahmadov, O.
Bone morphogenetic protein receptor type IA (BMPR1A) is a key mediator of chondrogenesis and a validated therapeutic target for cartilage repair, yet existing BMP mimetic peptides suffer from low potency and the full-length protein (rhBMP-2) carries significant safety risks. Generative AI tools for protein design can now produce de novo peptide binders, but none have been applied to cartilage regeneration targets. Here, we benchmarked four architecturally distinct AI tools--RFdiffusion, BindCraft, PepMLM, and RFpeptides--to design candidate BMPR1A-binding peptides. We generated 192 candidates alongside 98 negative controls (290 total) and evaluated all complexes using AlphaFold 3 structure prediction, dual physics-based energy scoring (PyRosetta and FoldX), and contact recapitulation against the crystallographic BMP-2:BMPR1A interface (PDB: 1REW). A four-metric composite ranking identified a 15-residue PepMLM design (pepmlm_L15_0026) as the top candidate, combining favorable binding energy (PyRosetta dGseparated = -45.9 REU; FoldX {Delta}G = -19.4 kcal/mol) with the highest contact recapitulation among top-ranked peptides (11/30 gold-standard interface residues). Designed candidates significantly outperformed controls on ipTM (p = 0.002) and FoldX {Delta}G (p < 0.001). BindCraft candidates achieved the highest structural confidence (ipTM up to 0.81) but exhibited moderate contact recapitulation (mean 0.224), consistent with the computational hypothesis that they may engage alternative BMPR1A binding surfaces rather than the native BMP-2 interface. Physicochemical filtering yielded a shortlist of 54 candidates across all four tools. These results establish a reproducible computational framework for AI-guided peptide design targeting cartilage regeneration and identify specific candidates for future experimental validation via binding assays and chondrocyte differentiation studies. 
Author summary: Damaged cartilage has limited capacity to heal, and current biological therapies based on bone morphogenetic protein 2 (BMP-2) carry serious safety concerns including ectopic bone formation and inflammation. Short peptides that mimic BMP-2's interaction with its receptor BMPR1A could offer a safer, more targeted alternative, but designing such peptides from scratch is challenging. We used four different artificial intelligence tools, each employing a distinct computational strategy, to generate 192 candidate peptides designed to bind BMPR1A. We then evaluated all candidates using multiple independent computational methods to assess binding quality, energy favorability, and whether each peptide targets the correct site on the receptor. Our analysis identified a shortlist of 54 promising candidates, with a 15-residue peptide from the language model-based tool PepMLM emerging as the top-ranked design. We also found evidence that one tool (BindCraft) may produce peptides that bind BMPR1A at sites different from the natural BMP-2 interface, highlighting the importance of validating not just whether a peptide binds, but where it binds. Our computational framework and candidate peptides provide a foundation for future laboratory testing toward cartilage repair therapies.
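The contact recapitulation score quoted above (11/30 for the top candidate) reads as the fraction of gold-standard interface residues that a design also contacts. A minimal sketch under that reading follows; the function name and the residue numbers are hypothetical placeholders, not the actual 1REW interface, and the authors' contact definition (for example, the distance cutoff) is not given in the abstract.

```python
def contact_recapitulation(predicted_contacts, reference_interface):
    """Fraction of reference interface residues also contacted by the design.

    Hypothetical helper: both arguments are iterables of receptor residue
    identifiers. Contacts outside the reference interface are ignored.
    """
    reference = set(reference_interface)
    if not reference:
        raise ValueError("reference interface must not be empty")
    return len(set(predicted_contacts) & reference) / len(reference)


# Placeholder residue numbering: 30 reference residues, 11 of them contacted,
# plus two off-interface contacts that do not count toward the score.
reference = range(1, 31)
predicted = list(range(1, 12)) + [100, 101]
score = contact_recapitulation(predicted, reference)  # 11 / 30
```

A score defined this way rewards overlap with the native interface only, so a design with high structural confidence but a low score, as reported for BindCraft, may still bind at a different surface.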
del Valle Morales, D.; Romano, G.; Saviana, M.; Nana-Sinkam, P.; Nigita, G.; Acunzo, M.
Tyrosine kinase inhibitors (TKIs) are widely used as effective chemotherapeutic agents for treating patients with EGFR-mutated NSCLC. Unfortunately, patients eventually develop resistance to TKI therapy. The most common resistance mechanism for the TKI Osimertinib is overexpression of the MET Proto-Oncogene, Receptor Tyrosine Kinase (MET). We previously demonstrated that miR-411-5p A-to-I edited at position 5 (miR-411ed) can directly target MET in A549 and H1299 cells. MiR-411ed in combination with Osimertinib reduced cell proliferation in two TKI-resistant EGFR-mutated cell lines, HCC827R and PC9R. However, miR-411ed did not downregulate MET expression in HCC827R, suggesting an alternative mechanism for the TKI response. In this study, we aim to identify the mechanism of the miR-411ed TKI response using a multi-omics approach combining RNAseq and protein mass spectrometry. In our cellular model, we identified miR-411ed-affected genes independent of MET activity, resulting in 211 genes (RNAseq) and 36 proteins (proteomics). Pathway analysis identified an increase in interferon signaling for RNAseq and combined omics, and a decrease in ERK/MAPK signaling in proteomics. Using the IsoTar target prediction tool, we identified STAT3 as a key regulator and confirmed STAT3 protein downregulation upon transfection with miR-411ed. We further investigated the effect of miR-411ed in vivo, observing a reduction in tumor size with miR-411ed in combination with Osimertinib but not with miR-411ed or Osimertinib treatment alone, confirming the effectiveness of miR-411ed in the TKI response.
Gorbenko, I. V.; Scherbakov, D. Y.; Zverintseva, K. M.; Konstantinov, Y. M.
Short Interrupted Repeats Cassettes (SIRC) are recently discovered eukaryotic DNA elements that possess many traits of satellite DNA and mobile genetic elements and consist of short direct repeats interspersed with diverse spacer sequences. The SIRC ensemble of an individual species is highly heterogeneous and cannot be studied using alignment methods. It was found that the number of similar SIRC sequences in a given pair of species is in general correlated with their taxonomic distance and that, at the same time, closely related species can possess highly diverged SIRC ensembles, which makes the SIRC evolutionary pattern closer to that of mobile genetic elements. The SIRC sequences form clusters with comparable sequence patterns that are likely to follow a doublet evolutionary model, which strongly suggests that the SIRC structure is maintained by evolutionary selection. Several SIRC sequences of Arabidopsis were found to be of ancient origin, with an evolutionary history traceable as far back as the moss clade. We carried out unbiased detection of SIRC ensembles in 10 plant genomes and found that, despite very high intraspecies heterogeneity, SIRC sets carry a strong interspecies phylogenetic signal. Key message: Short Interrupted Repeats Cassettes are elements of ancient origin and could potentially be used to trace organism history and to facilitate synteny and Hi-C analysis.
Cooper, H. B.; Rojas Lopez, K. E.; Schiavinato, D.; Black, M. A.; Gardner, P. P.
Proteins and non-coding RNAs are functional products of the genome that are central to crucial cellular processes. With recent technological advances, researchers can sequence genomes in the thousands and probe numerous genomic activities across many species and conditions. Such studies have identified thousands of potential proteins, RNAs and associated activities. However, there are conflicting interpretations of the results and therefore of which regions of the genome are "functional". Here we investigate the relative strengths of associations between coding and non-coding gene functionality and genomic features, by comparing reliably annotated functional genes to non-genic regions of the genome. We find that the strongest and most consistent associations between functional genes and genomic features are with transcriptional activity and evolutionary conservation. We also evaluated sequence-based statistics, genomic repeats, epigenetic and population variation data. Other features strongly associated with function include histone marks, chromatin accessibility, genomic copy-number, and sequence alignment statistics such as coding potential and covariation. We also identify potential issues with SNP annotations in short non-coding RNAs, as some highly conserved ncRNAs have significantly higher than expected SNP densities. Our results demonstrate the importance of evolutionary conservation and transcriptional activity for indicating protein-coding and non-coding gene function. Both should be taken into consideration when differentiating between functional sequences and biological or experimental noise.